Fisrt we read in the preprocessed data.
sceset <- readRDS("../rds/sce_cancerFT_afterQC.rds")
This dataset is after QC. You can see the QC results. There are 3877 QC-passed cells.
kable(table(sceset$description))
| Var1 | Freq |
|---|---|
| bulk | 27 |
| cell | 3877 |
| doublet | 63 |
| empty | 18 |
| failed | 858 |
Now we only keeps the QC-passed ones for the following analysis.
sceset <- sceset[,sceset$description == "cell"]
Now we start to prepare for the clustering. The clustering was performed with an in-house protocol called clincluster. In this cluster, the cells are clustering based on the condition (fresh, cryopreserved and cultured)
source("clincluster/clincluster_functions.R")
sceset@colData$source2 <- plyr::mapvalues(x = sceset$source,
from = unique(sceset@colData$source),
to = c("cryo","fresh","longC", "onC","longC"))
sceset <- PrepareData(sceset, col.for.cluster = "source2", do.scale = T)
## [1] "Normalising the data..."
## [1] "Centre the data"
Next we select the high variance gene based on the average expression and the dispersion. In total, 1258 genes were found as HVGs.
sceset <- HighVarGenes(sceset, verbose = T, mean.high.cutoff = 6, mean.low.cutoff = 0.3, dispersion.low.cutoff = 1, dispersion.high.cutoff = 7.5)
## [1] "Finding high variance genes..."
## [1] "1258 high variance genes are found."
kable(table(rowData(sceset)$high.var))
| Var1 | Freq |
|---|---|
| FALSE | 20852 |
| TRUE | 1258 |
The following plot shows the cutoffs for the HVGs selection. In subfigure (A) The x-axis is the average expression and the y-axis is the dispersion of expression scaled by the mean expression. The subfigure (B) is slightly different. the y-axis is the dispersion of expression.
df_plot <- data.frame(gene.mean = rowData(sceset)$gene.mean,
gene.dispersion.scaled = rowData(sceset)$gene.dispersion.scaled,
gene.dispersion = rowData(sceset)$gene.dispersion,
high.var = rowData(sceset)$high.var)
p1 <- ggplot(data = df_plot , aes(x = gene.mean, y = gene.dispersion.scaled, col = high.var) ) +
geom_point(alpha=0.4) + geom_hline(yintercept = 1, alpha = 0.5, col = "grey" ) +
geom_vline(xintercept = 0.3, alpha = 0.5, col = "grey" ) +
geom_vline(xintercept = 6, alpha = 0.5, col = "grey")
p2 <- ggplot(data = df_plot, aes(x = gene.mean, y = gene.dispersion, col = high.var) ) +
geom_point(alpha=0.4)
plot_grid(p1,p2, labels = "AUTO")
Next we run the tSNE calculation embeded in scater for visualisation by using the HVGs we just found.
set.seed(12345)
sceset <- runTSNE(object = sceset, ncomponents = 2, feature_set = rownames(sceset)[rowData(sceset)$high.var],
exprs_values = "logcounts",
perplexity = min(50, floor(ncol(sceset)/5)))
This plot shows you how the data looks on a 2d space.
p1 <- plotTSNE(sceset, colour_by = "source")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset, colour_by = "Patient2")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
plot_grid(p1,p2,labels = "AUTO")
To reduce the dimensions for clustering, we are calculating 20 PCs from HVGs and the log-transformed counts.
sceset <- runPCA(object = sceset, ncomponents = 20,
exprs_values = "logcounts", rand_seed = 12345,
feature_set = rownames(sceset)[rowData(sceset)$high.var == TRUE])
plot(1:20, attr(sceset@reducedDims$PCA, "percentVar")[1:20])
There are four groups. We first cluster the cells within each group.
kable(table(sceset$group.for.cluster))
| Var1 | Freq |
|---|---|
| cryo | 1177 |
| fresh | 2132 |
| longC | 161 |
| onC | 407 |
The initial clustering is based on knn-spectral clustering.
sceset <- InitialCluster(sceset, k = 6, ncomponents = 1:12, n.neighbor = 7)
## [1] "Got data ready"
## [1] "Initial clustering"
##
|
| | 0%
|
|================ | 25%
|
|================================ | 50%
|
|================================================= | 75%
|
|=================================================================| 100%
plotTSNE(sceset, colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p1 <- plotTSNE(sceset[,sceset$source2 == "fresh"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset[,sceset$source2 == "cryo"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p3 <- plotTSNE(sceset[,sceset$source2 == "longC"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p4 <- plotTSNE(sceset[,sceset$source2 == "onC"], colour_by = "initial.cluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
cowplot::plot_grid(p1,p2,p3,p4)
First we remove lowly expressed genes. 18099 genes are kept for DE analysis
matrix <- expm1(logcounts(sceset))
keep <- rowSums(matrix > 1) > 5
sum(keep)
## [1] 18099
dge <- edgeR::DGEList(counts = matrix[keep,]) # make a edgeR object
rm(matrix,keep)
sceset@colData$initial.cluster <- gsub(pattern = " ", replacement = "_", x = sceset@colData$initial.cluster)
sceset@colData$initial.cluster <- gsub(pattern = "-", replacement = "_", x = sceset@colData$initial.cluster)
design <- model.matrix(~ 0 + initial.cluster, data = sceset@colData) # Use 0 because we do not need intercept for this linear model
colnames(design)
## [1] "initial.clustercryo.1" "initial.clustercryo.2"
## [3] "initial.clustercryo.3" "initial.clustercryo.4"
## [5] "initial.clustercryo.5" "initial.clustercryo.6"
## [7] "initial.clusterfresh.1" "initial.clusterfresh.2"
## [9] "initial.clusterfresh.3" "initial.clusterfresh.4"
## [11] "initial.clusterfresh.5" "initial.clusterfresh.6"
## [13] "initial.clusterlongC.1" "initial.clusterlongC.2"
## [15] "initial.clusterlongC.3" "initial.clusterlongC.4"
## [17] "initial.clusterlongC.5" "initial.clusterlongC.6"
## [19] "initial.clusteronC.1" "initial.clusteronC.2"
## [21] "initial.clusteronC.3" "initial.clusteronC.4"
## [23] "initial.clusteronC.5" "initial.clusteronC.6"
v <- voom(dge, design, plot = F)
fit <- lmFit(v, design)
initial.clusters <- colnames(design)
nc <- ncol(design)
## Automating makeContrasts call in limma
contrast_all <- gtools::permutations(v = initial.clusters, n = nc, r = 2)
contrast_all <- apply(contrast_all, MARGIN = 1, function(x) return(paste(x[1],"-",x[2], sep = "")))
cont.matrix <- makeContrasts(contrasts = contrast_all,
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
n_deg <- matrix(0, ncol = nc, nrow = nc) # number of DE genes
colnames(n_deg) <- rownames(n_deg) <- gsub(x = colnames(design)[1:nc], pattern = "initial.cluster",replacement = "")
logcount <- logcounts(sceset)[rownames(sceset) %in% rownames(dge),]
for(i in 1:(nc-1)) {
for(j in (i+1):nc) {
if(i == j) {
n_deg[i,j] <- 0
} else if (j < i) {
coef_k = (i-1)*(nc-1)+j
} else if (j > i) {
coef_k = (i-1)*(nc-1)+j-1
}
if(i != j) {
rls <- topTable(fit2, n = Inf, coef = coef_k, sort = "p", lfc = 0.6, p = 0.05 )
if(nrow(rls) > 1) {
v_expr <- logcount[match(rownames(rls),rownames(logcount)), sceset$initial.cluster == rownames(n_deg)[i]]
rls$ratio1 <- rowSums(v_expr > 0.5)/ncol(v_expr)
v_expr <- logcount[match(rownames(rls),rownames(logcount)), sceset$initial.cluster == colnames(n_deg)[j]]
rls$ratio2 <- rowSums(v_expr > 0.5)/ncol(v_expr)
rls$ratiomax <- rowMaxs(as.matrix(rls[,c("ratio1", "ratio2")]))
rls$ratiomin <- rowMins(as.matrix(rls[,c("ratio1", "ratio2")]))
rls <- rls[rls$ratiomax > 0.25, ]
n_deg[i,j] <- sum(apply(rls, MARGIN = 1, function(x) return(abs(x[1]) * (x[9]+0.01)/(x[10]+0.01)))) ## 0.01 is used here to enhance the differences of on-off genes
} else if (nrow(rls) == 1) {
n_deg[i,j] <- sum(rls$logFC)
}
## This eqaution take fold change and expression ratio into account
}
}
}
n_deg <- n_deg + t(n_deg)
hc <- hclust(as.dist(n_deg))
plot(hc); rect.hclust(hc, k = 12, border = "red")
hc.cluster <- cutree(hc, k = 12)
colData(sceset)$clincluster <- hc.cluster[match(colData(sceset)$initial.cluster, names(hc.cluster))]
colData(sceset)$clincluster <- as.factor(colData(sceset)$clincluster)
table(colData(sceset)$clincluster)
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 73 166 404 155 444 186 1097 890 145 73 73 171
plotTSNE(sceset, colour_by = "clincluster")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
matrix <- expm1(logcounts(sceset))
keep <- rowSums(matrix > 1) > 5
dge <- edgeR::DGEList(counts = matrix[keep,]) # make a edgeR object
logcount <- logcounts(sceset)[rownames(sceset) %in% rownames(dge),]
markers <- c()
# pb <- txtProgressBar(min = 0, max = (length(unique(sceset$clincluster))), style = 3)
for(i in 1:length(unique(sceset$clincluster))){
info <- rep("control", ncol(sceset))
info[sceset$clincluster == i] <- "group"
design <- model.matrix(~ 0 + info)
v <- voom(dge, design, plot = F)
fit <- lmFit(v, design) # Linear Model for Series of Arrays
cont.matrix <- makeContrasts(contrasts = "infogroup-infocontrol",levels=design)
fit <- contrasts.fit(fit, cont.matrix ) # Linear Model for Series of Arrays
fit <- eBayes(fit)
marker <- topTable(fit, p.value = 0.05, number = Inf, coef = 1, lfc = 0.6, sort.by = "logFC")
marker <- marker[marker$logFC > 0.6,]
v_expr <- logcount[match(rownames(marker),rownames(logcount)), info == "group"]
marker$ratio1 <- rowSums(v_expr > 0.5)/ncol(v_expr)
v_expr <- logcount[match(rownames(marker),rownames(logcount)),info != "group"]
marker$ratio2 <- rowSums(v_expr > 0.5)/ncol(v_expr)
marker$gene <- rownames(marker)
marker$cluster <- i
markers <- rbind(markers, marker)
# setTxtProgressBar(pb, i)
}
# close(pb)
markers$cluster <- factor(markers$cluster)
top10 <- markers %>% group_by(cluster) %>% top_n(10, logFC)
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
kable(top10)
| logFC | AveExpr | t | P.Value | adj.P.Val | B | ratio1 | ratio2 | gene | cluster |
|---|---|---|---|---|---|---|---|---|---|
| 7.493925 | 2.615666 | 17.858006 | 0 | 0 | 144.52501 | 0.9863014 | 0.0386435 | SRGN | 1 |
| 7.185128 | 2.489490 | 17.843414 | 0 | 0 | 144.31111 | 0.9041096 | 0.0084122 | PTPRC | 1 |
| 6.858956 | 2.535946 | 15.108411 | 0 | 0 | 102.75485 | 0.8904110 | 0.0207676 | LAPTM5 | 1 |
| 6.796910 | 3.642329 | 15.693876 | 0 | 0 | 111.10035 | 0.8767123 | 0.2076761 | CXCR4 | 1 |
| 6.165879 | 2.472423 | 16.956048 | 0 | 0 | 130.22263 | 0.8493151 | 0.0113039 | CD53 | 1 |
| 5.855646 | 2.469413 | 14.386742 | 0 | 0 | 92.83496 | 0.7534247 | 0.0105152 | CD96 | 1 |
| 5.416764 | 3.347757 | 13.304994 | 0 | 0 | 78.71642 | 0.8356164 | 0.1713985 | LCP1 | 1 |
| 5.393131 | 2.499700 | 11.885707 | 0 | 0 | 61.76532 | 0.7534247 | 0.0186646 | LSP1 | 1 |
| 5.183070 | 4.394711 | 13.543537 | 0 | 0 | 81.71635 | 0.9178082 | 0.3845952 | ARHGDIB | 1 |
| 4.992156 | 3.341753 | 15.257743 | 0 | 0 | 104.90180 | 0.9041096 | 0.2478970 | STK17B | 1 |
| 6.763069 | 3.501691 | 20.209524 | 0 | 0 | 183.56233 | 0.9457831 | 0.1374293 | TFF3 | 2 |
| 5.365099 | 4.266078 | 20.635799 | 0 | 0 | 191.30479 | 0.9638554 | 0.3435732 | GOLM1 | 2 |
| 5.346003 | 3.639632 | 16.897211 | 0 | 0 | 128.48156 | 0.8855422 | 0.1616815 | PIFO | 2 |
| 5.139665 | 3.262308 | 16.772869 | 0 | 0 | 126.64423 | 0.9879518 | 0.1126381 | C1orf194 | 2 |
| 4.978251 | 5.165588 | 16.358361 | 0 | 0 | 120.14656 | 0.9277108 | 0.4217192 | MAP1B | 2 |
| 4.910626 | 3.551296 | 18.760309 | 0 | 0 | 158.62713 | 0.9156627 | 0.2023713 | PPIL6 | 2 |
| 4.747680 | 3.260331 | 14.960418 | 0 | 0 | 100.19840 | 0.9337349 | 0.1005120 | FAM183A | 2 |
| 4.682733 | 3.279713 | 14.843187 | 0 | 0 | 98.58025 | 0.8614458 | 0.1037456 | MS4A8B | 2 |
| 4.450050 | 3.206303 | 14.289734 | 0 | 0 | 91.10446 | 0.9337349 | 0.0959310 | TMEM190 | 2 |
| 4.418859 | 3.190226 | 15.902213 | 0 | 0 | 113.67290 | 0.8493976 | 0.1172191 | ARMC3 | 2 |
| 3.156161 | 3.182085 | 20.805783 | 0 | 0 | 197.00896 | 0.5321782 | 0.1238123 | GCM1 | 3 |
| 2.303796 | 5.898201 | 16.302024 | 0 | 0 | 120.64638 | 0.7920792 | 0.7014109 | RSRC2 | 3 |
| 2.115385 | 6.325972 | 12.415795 | 0 | 0 | 67.98545 | 0.7178218 | 0.6677224 | CLK1 | 3 |
| 1.671212 | 5.239914 | 9.946290 | 0 | 0 | 41.36550 | 0.5891089 | 0.5050389 | HBP1 | 3 |
| 1.571460 | 4.342570 | 9.994565 | 0 | 0 | 41.81005 | 0.5173267 | 0.3723006 | DDIT3 | 3 |
| 1.520377 | 3.535779 | 11.644425 | 0 | 0 | 59.03127 | 0.4603960 | 0.2732508 | MXD1 | 3 |
| 1.453073 | 6.663646 | 8.603633 | 0 | 0 | 29.26617 | 0.7376238 | 0.6778002 | STIP1 | 3 |
| 1.450149 | 3.993107 | 9.392428 | 0 | 0 | 36.11889 | 0.4257426 | 0.3031961 | KANSL2 | 3 |
| 1.433712 | 8.256008 | 13.794616 | 0 | 0 | 85.19311 | 0.9678218 | 0.9386697 | RBM39 | 3 |
| 1.380918 | 6.212063 | 9.652074 | 0 | 0 | 38.57855 | 0.7277228 | 0.7293406 | GPBP1 | 3 |
| 8.280606 | 3.501691 | 24.781825 | 0 | 0 | 272.35608 | 0.9419355 | 0.1399785 | TFF3 | 4 |
| 7.432450 | 3.260331 | 22.944057 | 0 | 0 | 235.18380 | 0.9935484 | 0.1004836 | FAM183A | 4 |
| 7.329172 | 3.262308 | 23.949950 | 0 | 0 | 255.40654 | 0.9870968 | 0.1152606 | C1orf194 | 4 |
| 7.205601 | 4.794764 | 22.740933 | 0 | 0 | 230.89815 | 0.9806452 | 0.3707684 | AGR2 | 4 |
| 6.949645 | 3.206303 | 21.782556 | 0 | 0 | 212.75857 | 0.9677419 | 0.0969909 | TMEM190 | 4 |
| 6.845368 | 3.242445 | 21.053827 | 0 | 0 | 199.10222 | 0.9225806 | 0.1002149 | C9orf24 | 4 |
| 6.814381 | 3.229279 | 20.613117 | 0 | 0 | 191.02397 | 0.9419355 | 0.0956475 | C20orf85 | 4 |
| 6.796722 | 3.279713 | 21.082596 | 0 | 0 | 199.63412 | 0.8903226 | 0.1047824 | MS4A8B | 4 |
| 6.442215 | 4.799370 | 21.979126 | 0 | 0 | 216.26256 | 0.9935484 | 0.4032778 | AGR3 | 4 |
| 6.434406 | 3.543655 | 22.013678 | 0 | 0 | 217.17336 | 0.9354839 | 0.1794734 | RSPH1 | 4 |
| 4.330928 | 3.934455 | 23.471224 | 0 | 0 | 249.33460 | 0.8040541 | 0.1706962 | PLAU | 5 |
| 4.303066 | 6.004591 | 24.369896 | 0 | 0 | 267.94182 | 0.9707207 | 0.5219924 | ANXA3 | 5 |
| 4.258083 | 4.457910 | 26.516998 | 0 | 0 | 314.65169 | 0.9414414 | 0.3049811 | TNFRSF21 | 5 |
| 4.223105 | 4.517445 | 26.482070 | 0 | 0 | 313.86835 | 0.9527027 | 0.3233324 | CDCP1 | 5 |
| 4.059569 | 5.414029 | 22.942299 | 0 | 0 | 238.55217 | 0.9707207 | 0.4276143 | MARS | 5 |
| 4.015646 | 7.841194 | 22.113260 | 0 | 0 | 222.04860 | 0.9932432 | 0.7325954 | LAMC2 | 5 |
| 3.972775 | 4.722760 | 22.913576 | 0 | 0 | 237.99298 | 0.8941441 | 0.3279930 | TUBB6 | 5 |
| 3.843969 | 3.838613 | 25.343625 | 0 | 0 | 288.84202 | 0.8445946 | 0.2175939 | TGFA | 5 |
| 3.792899 | 5.219056 | 24.541175 | 0 | 0 | 271.59277 | 0.9617117 | 0.4756773 | ADAM9 | 5 |
| 3.759404 | 5.165588 | 20.356696 | 0 | 0 | 188.79738 | 0.9324324 | 0.3801340 | MAP1B | 5 |
| 6.511838 | 3.133207 | 26.316379 | 0 | 0 | 309.42243 | 0.8494624 | 0.1110810 | SERPINE1 | 6 |
| 4.943719 | 5.995885 | 14.628581 | 0 | 0 | 96.36281 | 0.7634409 | 0.4760228 | IL8 | 6 |
| 4.637100 | 3.611697 | 17.940146 | 0 | 0 | 146.52447 | 0.7204301 | 0.1964237 | GEM | 6 |
| 4.510466 | 4.165234 | 19.214042 | 0 | 0 | 168.10209 | 0.9408602 | 0.3606069 | LGALS1 | 6 |
| 4.155748 | 3.990554 | 17.780649 | 0 | 0 | 143.90752 | 0.8064516 | 0.3102140 | PHLDA1 | 6 |
| 4.042896 | 5.534019 | 15.975231 | 0 | 0 | 115.67622 | 0.8387097 | 0.5429423 | FOSL1 | 6 |
| 3.967744 | 4.992061 | 16.245861 | 0 | 0 | 119.74555 | 0.8387097 | 0.4584124 | MPZL1 | 6 |
| 3.865202 | 2.732882 | 14.882151 | 0 | 0 | 99.92355 | 0.5591398 | 0.0392847 | DCN | 6 |
| 3.748799 | 6.652151 | 15.465753 | 0 | 0 | 108.18656 | 0.9677419 | 0.7174208 | UBB | 6 |
| 3.677978 | 2.677949 | 13.619427 | 0 | 0 | 82.91723 | 0.5698925 | 0.0349499 | MEG3 | 6 |
| 4.131843 | 8.525154 | 25.670314 | 0 | 0 | 295.82112 | 0.9453054 | 0.5845324 | FOS | 7 |
| 4.127485 | 7.073289 | 29.612023 | 0 | 0 | 386.50868 | 0.9288970 | 0.4798561 | NR4A1 | 7 |
| 3.621659 | 7.930901 | 29.676534 | 0 | 0 | 388.02761 | 0.9908842 | 0.6863309 | EGR1 | 7 |
| 3.189548 | 8.979067 | 21.394827 | 0 | 0 | 208.17844 | 0.9307201 | 0.6787770 | CRISP3 | 7 |
| 3.137517 | 6.986051 | 27.338675 | 0 | 0 | 333.18502 | 0.9197812 | 0.6143885 | C1orf63 | 7 |
| 3.042969 | 8.503664 | 23.229930 | 0 | 0 | 244.33647 | 0.9653601 | 0.6791367 | MSLN | 7 |
| 3.027996 | 5.597053 | 28.420278 | 0 | 0 | 358.29761 | 0.9033728 | 0.4316547 | FOSB | 7 |
| 2.778039 | 6.560015 | 19.815126 | 0 | 0 | 178.97554 | 0.8012762 | 0.4881295 | CXCL2 | 7 |
| 2.715983 | 6.139232 | 22.494649 | 0 | 0 | 229.62501 | 0.8204193 | 0.4946043 | BTG2 | 7 |
| 2.670104 | 6.695662 | 23.356205 | 0 | 0 | 246.95068 | 0.8869644 | 0.6046763 | NCOA7 | 7 |
| 4.065131 | 7.073289 | 26.390556 | 0 | 0 | 311.95681 | 0.9797753 | 0.4958152 | NR4A1 | 8 |
| 4.023885 | 8.525154 | 22.817688 | 0 | 0 | 236.12411 | 0.9876404 | 0.5969200 | FOS | 8 |
| 3.930515 | 6.974367 | 29.084374 | 0 | 0 | 374.15095 | 0.9640449 | 0.5544024 | C3 | 8 |
| 3.787296 | 8.180191 | 26.128197 | 0 | 0 | 306.10553 | 0.9719101 | 0.6548376 | HLA-DRA | 8 |
| 3.772599 | 6.560015 | 25.203502 | 0 | 0 | 285.88876 | 0.9224719 | 0.4737195 | CXCL2 | 8 |
| 3.715153 | 6.175397 | 29.730715 | 0 | 0 | 389.68391 | 0.9494382 | 0.4974891 | ZC3H12A | 8 |
| 3.681647 | 8.979067 | 23.003542 | 0 | 0 | 239.86334 | 0.9595506 | 0.6876465 | CRISP3 | 8 |
| 3.585745 | 4.803819 | 26.901401 | 0 | 0 | 323.49267 | 0.7370787 | 0.2597924 | CHI3L1 | 8 |
| 3.436071 | 7.803568 | 26.646823 | 0 | 0 | 317.68884 | 0.9719101 | 0.7037161 | HLA-DRB1 | 8 |
| 3.427348 | 5.929517 | 24.216766 | 0 | 0 | 264.87824 | 0.8022472 | 0.4218279 | HLA-DPA1 | 8 |
| 9.503120 | 3.603942 | 28.493126 | 0 | 0 | 359.30944 | 1.0000000 | 0.1623794 | TPPP3 | 9 |
| 8.142990 | 3.229279 | 23.648591 | 0 | 0 | 252.64510 | 1.0000000 | 0.0956592 | C20orf85 | 9 |
| 8.122980 | 3.100361 | 23.610057 | 0 | 0 | 251.86028 | 1.0000000 | 0.0731511 | ZMYND10 | 9 |
| 7.994897 | 3.639632 | 23.455852 | 0 | 0 | 248.68118 | 1.0000000 | 0.1613076 | PIFO | 9 |
| 7.704140 | 3.242445 | 22.614286 | 0 | 0 | 231.74689 | 0.9724138 | 0.1004823 | C9orf24 | 9 |
| 7.623226 | 2.959397 | 22.185386 | 0 | 0 | 223.29851 | 1.0000000 | 0.0568060 | SNTN | 9 |
| 7.557494 | 4.474227 | 24.482443 | 0 | 0 | 269.96994 | 1.0000000 | 0.3494105 | CAPS | 9 |
| 7.539040 | 2.973098 | 22.753646 | 0 | 0 | 234.53992 | 0.9862069 | 0.0632369 | LDLRAD1 | 9 |
| 7.373327 | 3.260331 | 21.445322 | 0 | 0 | 208.98802 | 1.0000000 | 0.1026259 | FAM183A | 9 |
| 7.146212 | 3.206303 | 21.374811 | 0 | 0 | 207.65254 | 0.9586207 | 0.0996785 | TMEM190 | 9 |
| 6.075956 | 4.165234 | 15.430737 | 0 | 0 | 104.81184 | 1.0000000 | 0.3767087 | LGALS1 | 10 |
| 6.073755 | 4.020645 | 13.634885 | 0 | 0 | 81.06973 | 0.9452055 | 0.2665615 | CCNA1 | 10 |
| 5.948681 | 5.203586 | 12.488000 | 0 | 0 | 67.17488 | 0.9863014 | 0.4379600 | TGM2 | 10 |
| 5.919642 | 3.419420 | 13.884339 | 0 | 0 | 84.32173 | 0.8767123 | 0.1874343 | UCA1 | 10 |
| 5.689158 | 2.740400 | 14.781466 | 0 | 0 | 96.23187 | 0.9863014 | 0.0686120 | SCD | 10 |
| 5.647768 | 3.122154 | 13.466814 | 0 | 0 | 79.15995 | 0.9041096 | 0.1282860 | WNT7A | 10 |
| 5.534420 | 3.377337 | 13.419972 | 0 | 0 | 78.55696 | 0.9041096 | 0.1845426 | MFAP5 | 10 |
| 5.526872 | 4.079079 | 12.339285 | 0 | 0 | 65.62422 | 0.9726027 | 0.2789169 | ALDH1A3 | 10 |
| 5.460513 | 3.761994 | 14.160684 | 0 | 0 | 87.87904 | 0.9863014 | 0.2773396 | ACAT2 | 10 |
| 5.406879 | 3.560699 | 12.870324 | 0 | 0 | 71.88712 | 0.9863014 | 0.2003155 | DHCR7 | 10 |
| 5.603419 | 4.165234 | 14.436834 | 0 | 0 | 93.06120 | 0.9726027 | 0.3772345 | LGALS1 | 11 |
| 5.492215 | 3.377337 | 13.536047 | 0 | 0 | 81.33709 | 0.8082192 | 0.1863828 | MFAP5 | 11 |
| 5.198020 | 5.828861 | 9.256727 | 0 | 0 | 34.99960 | 0.9178082 | 0.4277077 | KRT17 | 11 |
| 5.131380 | 3.419420 | 12.247042 | 0 | 0 | 65.69621 | 0.8082192 | 0.1887487 | UCA1 | 11 |
| 5.087037 | 5.203586 | 10.890454 | 0 | 0 | 50.74487 | 0.8904110 | 0.4398002 | TGM2 | 11 |
| 5.076277 | 3.721301 | 11.900766 | 0 | 0 | 61.72685 | 0.8356164 | 0.2360673 | TAGLN | 11 |
| 4.859095 | 8.297745 | 10.459261 | 0 | 0 | 46.30541 | 1.0000000 | 0.7549947 | KRT7 | 11 |
| 4.743550 | 3.680273 | 13.298494 | 0 | 0 | 78.36388 | 0.8767123 | 0.2939012 | C12orf75 | 11 |
| 4.446859 | 6.174009 | 9.620204 | 0 | 0 | 38.29411 | 0.9041096 | 0.5675605 | KLK10 | 11 |
| 4.445222 | 3.162950 | 10.388417 | 0 | 0 | 45.70811 | 0.6849315 | 0.1280231 | KRT6A | 11 |
| 3.834264 | 5.177970 | 13.740327 | 0 | 0 | 81.99544 | 0.8771930 | 0.4492715 | SLC1A5 | 12 |
| 3.775817 | 5.414029 | 13.205823 | 0 | 0 | 75.39135 | 0.9064327 | 0.4705882 | MARS | 12 |
| 3.747142 | 5.288900 | 13.673544 | 0 | 0 | 81.15467 | 0.9239766 | 0.4749056 | GARS | 12 |
| 3.577583 | 5.351104 | 12.688685 | 0 | 0 | 69.29353 | 0.8830409 | 0.4490016 | AXL | 12 |
| 3.534286 | 3.613653 | 13.709260 | 0 | 0 | 81.97845 | 0.7368421 | 0.2115488 | FSTL3 | 12 |
| 3.498838 | 6.432474 | 11.706103 | 0 | 0 | 58.14422 | 0.9415205 | 0.5866163 | LAMB3 | 12 |
| 3.483771 | 6.371894 | 14.713662 | 0 | 0 | 94.39811 | 0.9766082 | 0.7139773 | HMGA1 | 12 |
| 3.408765 | 5.010256 | 12.685398 | 0 | 0 | 69.32909 | 0.8888889 | 0.4363195 | DDX39A | 12 |
| 3.394393 | 4.070904 | 15.139305 | 0 | 0 | 100.67303 | 0.8947368 | 0.3664328 | EMP3 | 12 |
| 3.383319 | 4.434041 | 12.948795 | 0 | 0 | 72.53221 | 0.8070175 | 0.3505127 | SHMT2 | 12 |
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 5403100 288.6 8458888 451.8 NA 8458888 451.8
## Vcells 510217535 3892.7 1493200851 11392.3 16384 1493199592 11392.3
markers <- markers[rowMaxs(as.matrix(markers[,c("ratio1","ratio2")])) > 0.4,]
top10 <- markers %>% group_by(cluster) %>% top_n(10, logFC)
plot.data <- logcounts(sceset)[top10$gene, order(sceset$clincluster, decreasing = F)]
colanno <- data.frame (colData(sceset)[,c("clincluster","Patient2")])
colnames(colanno)[1] <- "clusters"
colanno$clusters <- factor(colanno$clusters)
rownames(colanno) <- colnames(sceset)
colanno <- colanno[order(colanno$clusters, decreasing = F),]
colanno$clusters <- factor(colanno$clusters, levels = unique(colanno$clusters))
plot.data <- plot.data[,match(rownames(colanno), colnames(plot.data))]
plot.data <- t(scale(t(plot.data), center = T, scale = T))
plot.data <- Seurat::MinMax(plot.data, min = -2, max = 2)
plot.data<- as.data.frame(x = t(x = plot.data))
plot.data$cell <- rownames(x = plot.data)
cells.ident <- sceset$clincluster
names(x = cells.ident) <- sceset$Sample
colnames(x = plot.data) <- make.unique(names = colnames(x = plot.data))
plot.data %>% melt(id.vars = "cell") -> plot.data
names(x = plot.data)[names(x = plot.data) == 'variable'] <- 'gene'
names(x = plot.data)[names(x = plot.data) == 'value'] <- 'expression'
plot.data$ident <- cells.ident[plot.data$cell]
plot.data$gene <- with(
data = plot.data,
expr = factor(x = gene, levels = rev(x = unique(x = plot.data$gene)))
)
plot.data$cell <- with(
data = plot.data,
expr = factor(x = cell, levels = unique(x = colnames(sceset)))
)
my_colours <- colorRampPalette(c("steelblue4", "white", "firebrick2"))(200)
heatmap <- ggplot( data = plot.data, mapping = aes(x = cell, y = gene, fill = expression)) + geom_tile() +
scale_fill_gradient2(
# low = muted("blue"), mid = "white", high = muted("red")
low = muted("steelblue4"), mid = "white",
high = muted("firebrick2"),
name= "Expression", guide = guide_colorbar(
direction = "vertical",
title.position = "top"
)
) +
scale_y_discrete(position = "right", labels = rev(top10$gene)) +
theme(
axis.line = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
strip.text.x = element_text(size = 15),
axis.text.y = element_text(size = 6),
axis.text.x = element_text(size = 6),
axis.title.x = element_blank()
)
heatmap <- heatmap +
facet_grid(
facets = ~ident,
drop = TRUE,
space = "free",
scales = "free",
switch = 'x'
) +
scale_x_discrete(expand = c(0, 0), drop = TRUE) +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.line = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank()
)
panel.spacing <- unit(x = 0.15, units = 'lines')
heatmap <- heatmap +
theme(strip.background = element_blank(), panel.spacing = panel.spacing)
heatmap
# saveRDS(sceset,"../rds/20180917Sceset_12clusters.rds", compress = T)
# write.csv(markers,"../tables/20180917sceset_markers_12cluster.csv", row.names = T)
current.cluster.ids <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ,12)
new.cluster.ids <- c("Leukocyte",
"Cultured ciliated",
"O.N. cultured FTESCs",
"Cultured ciliated",
"O.N. cultured FTESCs",
"Stromal cells",
"Fresh FTESCs",
"Fresh FTESCs",
"Fresh ciliated",
"Long cultured FTESCs 2",
"Long cultured FTESCs 1",
"O.N. cultured FTESCs")
sceset$ident <- plyr::mapvalues(x = sceset$clincluster, from = current.cluster.ids, to = new.cluster.ids)
sceset$ident <- factor(sceset$ident,
levels = c("Fresh FTESCs",
"Fresh ciliated",
"O.N. cultured FTESCs",
"Cultured ciliated",
"Long cultured FTESCs 1",
"Long cultured FTESCs 2",
"Leukocyte",
"Stromal cells"))
sceset$sources <- sceset$source
sceset$sources[sceset$sources %in% c("2-day cultured", "6-day cultured")] <- "Long cultured"
sceset$sources <- factor(sceset$sources, levels = c("Fresh","O.N. cultured","cryopreserved",
"Long cultured"))
plotTSNE(object = sceset, colour_by = "ident")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
t-SNE plot profiles ~3,600 single-cell transcriptome from fallopian tubes coloured by patients
plotTSNE(sceset, colour_by = "Patient2")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
# ggsave("../manuscript_plots/FigureS1A_tsne.tiff", res = 300, width = 7, height = 5.5, units = "in")
Expression plot of secretory markers PAX8, KART7, ciliated markers CAPS and CCDC17.
p1 <- plotTSNE(sceset, colour_by = "EPCAM")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p2 <- plotTSNE(sceset, colour_by = "PTPRC")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p3 <- plotTSNE(sceset, colour_by = "PAX8")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p4 <- plotTSNE(sceset, colour_by = "KRT7")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p5 <- plotTSNE(sceset, colour_by = "CAPS")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p6 <- plotTSNE(sceset, colour_by = "CCDC17")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
p7 <-plotTSNE(sceset, colour_by = "CCDC78")
## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
cowplot::plot_grid(p1,p2,p3,p4,p5,p6,p7, ncol=4)
fresh <- sceset[,sceset$source == "Fresh"]
fresh$type <- "Secretory"
fresh$type[fresh$ident == "Fresh ciliated"] <- "Ciliated"
plotExpression(fresh, x = "type", features = c("KRT7","PAX8"), ncol = 5,xlab = "Cell type") + theme(strip.text = element_text(size = 12, face = "italic") )
# ggsave("plots/SuppFig1H_secretory_markers.png", height = 2, width = 3.5)
plotExpression(fresh, x = "type", features = c("CCDC17","CCDC78","CAPS","FOXJ1"), ncol = 5,xlab = "Cell type") + theme(strip.text = element_text(size = 12, face = "italic") )
# ggsave("../../revision_analysis_20190827/revision_plots/SI/SuppFig1I_ciliated_markers_foxj1ADDED.png", height = 2, width = 5.7)
# saveRDS(sceset,"../rds/sce_cancerFT_clustered.rds")
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 cowplot_0.9.4
## [3] knitr_1.21 reshape2_1.4.3
## [5] scales_1.0.0 dplyr_0.7.8
## [7] edgeR_3.24.3 limma_3.38.3
## [9] scater_1.10.1 ggplot2_3.2.0.9000
## [11] SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0
## [13] DelayedArray_0.8.0 BiocParallel_1.16.5
## [15] matrixStats_0.54.0 Biobase_2.42.0
## [17] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
## [19] IRanges_2.16.0 S4Vectors_0.20.1
## [21] BiocGenerics_0.28.0
##
## loaded via a namespace (and not attached):
## [1] Seurat_3.0.2 Rtsne_0.15
## [3] ggbeeswarm_0.6.0 colorspace_1.4-0
## [5] ggridges_0.5.1 XVector_0.22.0
## [7] listenv_0.7.0 npsurv_0.4-0
## [9] ggrepel_0.8.0 codetools_0.2-16
## [11] splines_3.5.2 R.methodsS3_1.7.1
## [13] lsei_1.2-0 jsonlite_1.6
## [15] ica_1.0-2 cluster_2.0.7-1
## [17] png_0.1-7 R.oo_1.22.0
## [19] sctransform_0.2.0 HDF5Array_1.10.1
## [21] compiler_3.5.2 httr_1.4.0
## [23] assertthat_0.2.0 Matrix_1.2-15
## [25] lazyeval_0.2.1 htmltools_0.3.6
## [27] tools_3.5.2 rsvd_1.0.2
## [29] igraph_1.2.3 gtable_0.2.0
## [31] glue_1.3.0 GenomeInfoDbData_1.2.0
## [33] RANN_2.6.1 Rcpp_1.0.0
## [35] gdata_2.18.0 ape_5.2
## [37] nlme_3.1-137 DelayedMatrixStats_1.4.0
## [39] gbRd_0.4-11 lmtest_0.9-36
## [41] xfun_0.4 stringr_1.4.0
## [43] globals_0.12.4 irlba_2.3.3
## [45] kknn_1.3.1 gtools_3.8.1
## [47] future_1.14.0 zlibbioc_1.28.0
## [49] MASS_7.3-51.1 zoo_1.8-4
## [51] rhdf5_2.26.2 RColorBrewer_1.1-2
## [53] yaml_2.2.0 reticulate_1.10
## [55] pbapply_1.4-0 gridExtra_2.3
## [57] stringi_1.2.4 highr_0.7
## [59] caTools_1.17.1.1 bibtex_0.4.2
## [61] Rdpack_0.10-1 SDMTools_1.1-221
## [63] rlang_0.4.0 pkgconfig_2.0.2
## [65] bitops_1.0-6 evaluate_0.13
## [67] lattice_0.20-38 ROCR_1.0-7
## [69] purrr_0.3.0 Rhdf5lib_1.4.2
## [71] bindr_0.1.1 htmlwidgets_1.3
## [73] labeling_0.3 tidyselect_0.2.5
## [75] plyr_1.8.4 magrittr_1.5
## [77] R6_2.3.0 gplots_3.0.1.1
## [79] pillar_1.3.1 withr_2.1.2
## [81] fitdistrplus_1.0-14 survival_2.43-3
## [83] RCurl_1.95-4.11 tsne_0.1-3
## [85] tibble_2.0.1 future.apply_1.3.0
## [87] crayon_1.3.4 KernSmooth_2.23-15
## [89] plotly_4.8.0 rmarkdown_1.11
## [91] viridis_0.5.1 locfit_1.5-9.1
## [93] grid_3.5.2 data.table_1.12.0
## [95] metap_1.1 digest_0.6.18
## [97] tidyr_0.8.2 R.utils_2.7.0
## [99] munsell_0.5.0 beeswarm_0.2.3
## [101] viridisLite_0.3.0 vipor_0.4.5
R version 3.5.2 (2018-12-20) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Mojave 10.14.4
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] modes_0.7.0 ROCR_1.0-7 gplots_3.0.1.1 KernSmooth_2.23-15 fields_9.6 maps_3.3.0
[7] spam_2.2-2 dotCall64_1.0-0 DoubletFinder_2.0.1 Seurat_2.3.4 Matrix_1.2-15 cowplot_0.9.4
[13] bindrcpp_0.2.2 reshape2_1.4.3 scales_1.0.0 dplyr_0.7.8 edgeR_3.24.3 limma_3.38.3
[19] scater_1.10.1 ggplot2_3.1.0 SingleCellExperiment_1.4.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.5
[25] matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0 S4Vectors_0.20.1
[31] BiocGenerics_0.28.0
loaded via a namespace (and not attached): [1] reticulate_1.10 R.utils_2.7.0 tidyselect_0.2.5 htmlwidgets_1.3 trimcluster_0.1-2.1 Rtsne_0.15 devtools_2.0.1
[8] munsell_0.5.0 codetools_0.2-16 ica_1.0-2 statmod_1.4.30 scran_1.10.2 umap_0.1.0.3 withr_2.1.2
[15] colorspace_1.4-0 knitr_1.21 rstudioapi_0.9.0 robustbase_0.93-3 dtw_1.20-1 gbRd_0.4-11 Rdpack_0.10-1
[22] labeling_0.3 lars_1.2 GenomeInfoDbData_1.2.0 pheatmap_1.0.12 bit64_0.9-7 rhdf5_2.26.2 rprojroot_1.3-2
[29] xfun_0.4 diptest_0.75-7 R6_2.3.0 ggbeeswarm_0.6.0 locfit_1.5-9.1 hdf5r_1.0.1 flexmix_2.3-14
[36] bitops_1.0-6 assertthat_0.2.0 SDMTools_1.1-221 nnet_7.3-12 beeswarm_0.2.3 gtable_0.2.0 npsurv_0.4-0
[43] processx_3.2.1 rlang_0.3.1 splines_3.5.2 lazyeval_0.2.1 acepack_1.4.1 checkmate_1.9.1 yaml_2.2.0
[50] backports_1.1.3 Hmisc_4.2-0 tools_3.5.2 usethis_1.4.0 RColorBrewer_1.1-2 proxy_0.4-22 dynamicTreeCut_1.63-1
[57] sessioninfo_1.1.1 ggridges_0.5.1 kknn_1.3.1 Rcpp_1.0.0 plyr_1.8.4 base64enc_0.1-3 zlibbioc_1.28.0
[64] purrr_0.3.0 RCurl_1.95-4.11 ps_1.3.0 prettyunits_1.0.2 rpart_4.1-13 pbapply_1.4-0 viridis_0.5.1
[71] zoo_1.8-4 cluster_2.0.7-1 fs_1.2.6 magrittr_1.5 data.table_1.12.0 lmtest_0.9-36 RANN_2.6.1
[78] mvtnorm_1.0-8 fitdistrplus_1.0-14 pkgload_1.0.2 evaluate_0.13 lsei_1.2-0 mclust_5.4.2 gridExtra_2.3
[85] compiler_3.5.2 tibble_2.0.1 crayon_1.3.4 R.oo_1.22.0 htmltools_0.3.6 segmented_0.5-3.0 Formula_1.2-3
[92] snow_0.4-3 tidyr_0.8.2 MASS_7.3-51.1 fpc_2.1-11.1 cli_1.0.1 R.methodsS3_1.7.1 gdata_2.18.0
[99] metap_1.1 bindr_0.1.1 igraph_1.2.3 pkgconfig_2.0.2 foreign_0.8-71 foreach_1.4.4 vipor_0.4.5
[106] XVector_0.22.0 bibtex_0.4.2 stringr_1.4.0 callr_3.1.1 digest_0.6.18 tsne_0.1-3 rmarkdown_1.11
[113] htmlTable_1.13.1 DelayedMatrixStats_1.4.0 curl_3.3 kernlab_0.9-27 gtools_3.8.1 modeltools_0.2-22 nlme_3.1-137
[120] jsonlite_1.6 Rhdf5lib_1.4.2 BiocNeighbors_1.0.0 desc_1.2.0 viridisLite_0.3.0 pillar_1.3.1 lattice_0.20-38
[127] httr_1.4.0 DEoptimR_1.0-8 pkgbuild_1.0.2 survival_2.43-3 glue_1.3.0 remotes_2.0.2 png_0.1-7
[134] prabclus_2.2-7 iterators_1.0.10 bit_1.1-14 class_7.3-15 stringi_1.2.4 HDF5Array_1.10.1 mixtools_1.1.0
[141] doSNOW_1.0.16 latticeExtra_0.6-28 caTools_1.17.1.1 memoise_1.1.0 irlba_2.3.3 ape_5.2